STmut: a framework for visualizing somatic alterations in spatial transcriptomics data of cancer

Spatial transcriptomic technologies, such as the Visium platform, measure gene expression in different regions of tissues. Here, we describe new software, STmut, to visualize somatic point mutations, allelic imbalance, and copy number alterations in Visium data. STmut is tested on fresh-frozen Visium data, formalin-fixed paraffin-embedded (FFPE) Visium data, and tumors with and without matching DNA sequencing data. Copy number is inferred under all conditions, but the chemistry of the FFPE platform does not permit analyses of single nucleotide variants. Taken together, we propose solutions to add the genetic dimension to spatial transcriptomic data and describe the limitations of different datatypes.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-023-03121-6.



An enrichment of spots with copy number alterations. Copy number alterations were inferred from each spot's RNA-sequencing data, as described. A "score" was calculated (see methods) to reflect how similar the RNA-inferred copy number profile of each spot was to the DNA-inferred copy number profile of the bulk tumor shown in Figure S1 (spots with a higher score had copy number profiles that mirrored the DNA-inferred profiles). Scores were also calculated on permuted data to indicate the spectrum of scores that could occur by random chance. In panel A, histograms of scores are shown for permuted data, for all spots, and for subsets of spots, as indicated. Note that the permuted scores are centered at 0 while the observed histogram is shifted to the right. The rightward shift is driven by spots from gene expression clusters that are thought to derive from tumor cells, as shown in the subsetted data. In panel B, a quantile-quantile (Q-Q) plot compares the observed scores to equivalent quantiles from the permuted data. Note the off-diagonal shift, confirming the skew in observed data towards higher scores. False discovery rates were calculated by comparing the frequency of permuted scores (false positives) to observed scores (total positives) and used to threshold the spots into 4 categories: "Likely Tumor", "Possibly Tumor", "Unclear Identity", or "Unlikely Tumor". Legend categories shown in the plot: Possibly Tumor (0.25 < q < 0.5), Unclear Identity (0.5 < q but CNVscore > 0), Unlikely Tumor (CNVscore < 0).
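The comparison of permuted to observed scores described above can be illustrated with a minimal sketch. It assumes that the false discovery rate at a given score is the fraction of permuted scores at or above that score divided by the fraction of observed scores at or above it; the exact estimator is described in the methods and may differ. The function names, the toy scores, the "Likely Tumor" cutoff (q <= 0.25), and the handling of values exactly on a boundary are assumptions made for illustration only.

```python
from bisect import bisect_left


def empirical_fdr(observed, permuted):
    """Estimate a q value per observed score (assumed estimator, see lead-in)."""
    obs_sorted = sorted(observed)
    perm_sorted = sorted(permuted)
    q_values = []
    for score in observed:
        # Fraction of permuted scores at or above this score (false positives).
        perm_tail = (len(perm_sorted) - bisect_left(perm_sorted, score)) / len(perm_sorted)
        # Fraction of observed scores at or above this score (total positives).
        obs_tail = (len(obs_sorted) - bisect_left(obs_sorted, score)) / len(obs_sorted)
        q_values.append(min(1.0, perm_tail / obs_tail))
    return q_values


def categorize(score, q):
    """Bin a spot using the legend thresholds; boundary handling is assumed."""
    if q <= 0.25:
        return "Likely Tumor"  # cutoff assumed, not stated in the legend
    if q <= 0.5:
        return "Possibly Tumor"
    return "Unclear Identity" if score > 0 else "Unlikely Tumor"


# Toy scores, not real data.
observed = [-0.2, 0.05, 0.15, 0.6, 0.8]
permuted = [-0.3, -0.1, 0.06, 0.1, 0.2]
q_values = empirical_fdr(observed, permuted)
labels = [categorize(s, q) for s, q in zip(observed, q_values)]
```

In this toy example the two highest-scoring spots have no permuted scores at or above them, so their estimated q is 0 and they fall in the most confident category.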

Figure S1. Copy number alterations and allelic imbalances in tumors from patients 4 and 6. A-B. Exome sequencing of DNA from bulk tumor tissue was performed from patients 4 (panel A) and 6 (panel B). Top panel: Copy number alterations were inferred over individual bins across the genome (grey data points) and segmented (gold lines) as described. Bottom panel: Allelic imbalance was inferred over germline heterozygous SNPs (grey data points) and segmented (gold lines) as described. In this plot, allelic imbalance equates to the major allele fraction minus 0.5. For example, a 50% to 50% or 60% to 40% ratio of reads, mapping to each allele of a heterozygous SNP, would respectively have imbalance values of 0.0 or 0.1 (i.e. the deviation from the expected fraction of 0.5). Overall, there were no compelling signals of copy number alterations or loss of heterozygosity in the patient 4 tumor, whereas several copy number alterations (noted) were present in the patient 6 tumor.
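The allelic imbalance definition in this caption can be written out as a short worked example. The function name and the read counts below are hypothetical and serve only to reproduce the two ratios mentioned in the caption.

```python
def allelic_imbalance(reads_a: int, reads_b: int) -> float:
    """Major allele fraction minus 0.5 for one heterozygous SNP."""
    total = reads_a + reads_b
    if total == 0:
        raise ValueError("no reads cover this SNP")
    # The major allele is whichever allele has more supporting reads.
    major_fraction = max(reads_a, reads_b) / total
    return major_fraction - 0.5


balanced = allelic_imbalance(50, 50)  # 50% to 50% ratio, imbalance of 0.0
skewed = allelic_imbalance(60, 40)    # 60% to 40% ratio, imbalance close to 0.1
```

Because the major allele is always used, the value ranges from 0.0 (balanced) to 0.5 (all reads from one allele).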

Figure S2. A splice-site mutation affecting UBXN1 is detectable in DNA- and RNA-sequencing data. Sequencing reads from exome sequencing of tumor DNA, exome sequencing of reference DNA, and RNA-sequencing of spatially barcoded cDNAs are visualized using the Integrative Genomics Viewer (IGV) browser. Reads are shown at gene- and exon-level resolution, as indicated. Within each dataset, the upper track shows relative sequencing coverage and the lower track shows individual sequencing reads. Variant reads exceeding 10% allele frequency are colored. Note the bias in sequencing coverage toward the 3' end of the gene in the spatial transcriptomic data. Also note how the mutant allele fails to properly splice.

Figure S4. Clonal structure of somatic mutations in a cutaneous squamous cell carcinoma. A. Tiling plots show the distribution of mutations (rows) across spots (columns) from the spatial transcriptomic data. Red tiles register when a mutation was present in a given spot. White tiles denote when a mutation was likely absent (high coverage over the reference allele without any mutant reads). Grey tiles had insufficient coverage to make a mutation call (light grey) or no coverage whatsoever (dark grey). B. Clonal analysis was difficult because of the large amount of missing data; however, a subset of spots had 2 or more mutations, and a subset of mutations occurred in 2 or more spots. We presume that these mutations are linked and that the spots in which they occur are from the same clone. There was one group of linked mutations/spots, though we may have been underpowered to detect subclones below a certain threshold. The spots in normal tissue, indicated below the tiling plot, were part of the dominant clone. C. The localization of "linked" and "unlinked" spots. The small number of "unlinked" spots are probably from the same clone, given that they occupy a similar spatial footprint. There were two main gene expression clusters (leading edge and interior), and "linked" spots occur in both clusters. D. Stacked barplot shows the proportion of spots from each gene expression cluster with mutant reads, confirming that the main clone spans both of the major gene expression clusters.
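The tile states in panel A and the counting rule in panel B can be sketched as follows. This is an illustration under stated assumptions, not the STmut implementation: the function names, the input layout (a mapping from mutation and spot to reference and mutant read counts), and the `min_ref_reads` cutoff for "high coverage" are all hypothetical, since the caption does not give the threshold.

```python
from collections import defaultdict


def classify_tile(ref_reads, mut_reads, min_ref_reads=10):
    """Assign one tile state; min_ref_reads is an assumed cutoff."""
    if mut_reads > 0:
        return "mutant"                 # red tile
    if ref_reads == 0:
        return "no coverage"            # dark grey tile
    if ref_reads >= min_ref_reads:
        return "likely absent"          # white tile
    return "insufficient coverage"      # light grey tile


def linked_candidates(counts, min_ref_reads=10):
    """Return mutations seen in >= 2 spots and spots carrying >= 2 mutations."""
    spots_per_mutation = defaultdict(set)
    mutations_per_spot = defaultdict(set)
    for (mutation, spot), (ref_reads, mut_reads) in counts.items():
        if classify_tile(ref_reads, mut_reads, min_ref_reads) == "mutant":
            spots_per_mutation[mutation].add(spot)
            mutations_per_spot[spot].add(mutation)
    multi_spot = {m for m, spots in spots_per_mutation.items() if len(spots) >= 2}
    multi_mutation = {s for s, muts in mutations_per_spot.items() if len(muts) >= 2}
    return multi_spot, multi_mutation


# Toy read counts: (mutation, spot) -> (reference reads, mutant reads).
counts = {
    ("mutA", "spot1"): (3, 2),
    ("mutA", "spot2"): (0, 1),
    ("mutB", "spot1"): (0, 4),
    ("mutB", "spot3"): (25, 0),
    ("mutC", "spot2"): (2, 0),
    ("mutC", "spot3"): (0, 0),
}
multi_spot_mutations, multi_mutation_spots = linked_candidates(counts)
```

Grouping these candidates into clones would require a further step, such as finding connected sets of spots and mutations, which is omitted here.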

Figure S7. An enrichment of spots with copy number alterations from FFPE-Visium tumors. CNVscores were calculated from individual spots and permuted data (see methods). Histograms of CNVscores for permuted data or subsets of spots are shown alongside Q-Q plots comparing the CNVscores of the permuted data to the observed data (see Fig. S5 for a full description of these plots). These data were used to calculate false discovery rates at different CNVscores and to bin spots into categories: likely tumor, possibly tumor, unclear identity, and not tumor. A. Data from a cutaneous squamous cell carcinoma adjacent to an actinic keratosis (case BB05). B. Data from a cutaneous squamous cell carcinoma adjacent to an actinic keratosis (case BB09). C. Data from a melanoma adjacent to a nevus (case Patient76).

Figure S3. An excess of mutant reads in histologically benign tissue. B. The table on the left summarizes the surface area, number of mutant spots, and mutant spot density in each region of the two replicates as well as the combined data from the two replicates. The bar graph specifically highlights the mutation density in the benign tissue versus background (non-tissue spots) from the combined data. Error bars represent 95% confidence intervals using the Poisson test.
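One common way to obtain a 95% confidence interval for a Poisson count is the exact (Garwood) interval, sketched below with the standard library only. Whether the error bars in this figure were built with this exact construction is an assumption; the function names, the bisection bounds, and the toy count and area are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); suitable for modest counts."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(1.0, total)


def _bisect_decreasing(func, lo, hi, iters=200):
    """Find the root of a function that decreases from positive to negative."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_exact_ci(count, alpha=0.05):
    """Exact two-sided interval for the Poisson mean given an observed count."""
    hi = count + 10 * math.sqrt(count + 1) + 10  # generous upper search bound
    if count == 0:
        lower = 0.0
    else:
        # Lower limit: mean at which P(X >= count) equals alpha / 2.
        lower = _bisect_decreasing(
            lambda mu: poisson_cdf(count - 1, mu) - (1 - alpha / 2), 0.0, hi)
    # Upper limit: mean at which P(X <= count) equals alpha / 2.
    upper = _bisect_decreasing(
        lambda mu: poisson_cdf(count, mu) - alpha / 2, 0.0, hi)
    return lower, upper


# Toy example: 10 mutant spots over an area of 2.0 (arbitrary units).
mutant_spots, area = 10, 2.0
ci_low, ci_high = poisson_exact_ci(mutant_spots)
density = mutant_spots / area
density_ci = (ci_low / area, ci_high / area)
```

Dividing the count limits by the surface area converts them into limits on the mutant spot density, which is the quantity plotted in the bar graph.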